Abstract. The Virtual Observatory (VO) is nearing maturity, and in Spain the Spanish VO
(SVO) has existed since June 2004. There have also been numerous attempts at providing more
or less encompassing grid initiatives at the national level, and Spain finally has an official
National Grid Initiative (NGI). In this talk we will show the VO and Grid development
status of nationally funded initiatives in Spain, and we will hint at potential joint VO-Grid
use cases to be developed in Spain in the near future.
1. Introduction

The Virtual Observatory, as proposed by Szalay & Gray (2001), is
today a tangible infrastructure used by thousands of astronomers
(and non-astronomers) every day. It helps scientists by letting
them explore massive amounts of data, derive statistical
properties from thousands of objects, and even find new kinds of
objects; by providing a multi-wavelength view of particular objects
or regions of the sky; and by providing time-lapsed views of any
region of the sky from archived data.

As mentioned, the foundational paper of the Virtual Observatory by
Szalay and Gray was written in 2001, and the first Astronomical
Virtual Observatory (AVO) prototype, based on CDS's Aladin Sky
Atlas, is from 2002 (see Padovani 2005). By that time, our research
team (the AMIGA group, Analysis of the interstellar Medium of
Isolated GAlaxies; Verdes-Montenegro et al. 2005) was working on a
multi-wavelength study of the 1,051 galaxies in the Catalogue of
Isolated Galaxies by Karachentseva (1973; see also her online
catalogue, Karachentseva et al. 1997). The group had already
compiled quite a large amount of information (revised positions, IR
fluxes at different wavelengths, images with different filters),
but still needed to retrieve more information, and wanted to make
its different revised data products available to the community.

Using the existing International Virtual Observatory Alliance
(IVOA) Proposed Recommendations, such as VOTable (Ochsenbein et al.
2004) and Simple Cone Search (Williams et al. 2008), the AMIGA
group finally provided a public web site delivering data in VOTable
format, and providing at the same time a Simple Cone Search
interface.

Send offprint requests to: J.D. Santander-Vela

IVOA: http://ivoa.net/
Aladin Sky Atlas: http://aladin.u-strasbg.fr/
AMIGA database: http://amiga.iaa.csic.es:8080/DATABASE/
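To illustrate how little is needed to talk to such a service, the following Python sketch builds a Simple Cone Search request and mimics the positional filter that a service applies to its catalogue. The RA, DEC and SR query parameters (decimal degrees) are those defined by the IVOA recommendation; the base URL http://example.org/scs, the function names and the row layout are hypothetical placeholders, and do not describe the actual AMIGA service.

```python
import math
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Return a Simple Cone Search query URL (all angles in decimal degrees)."""
    if not (0.0 <= ra_deg <= 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("position out of range")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    if base_url.endswith(("?", "&")):
        return base_url + query
    return base_url + ("&" if "?" in base_url else "?") + query


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two positions (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def select_in_cone(rows, ra_deg, dec_deg, radius_deg):
    """Keep the catalogue rows (dicts with 'ra' and 'dec') inside the cone."""
    return [row for row in rows
            if angular_separation_deg(ra_deg, dec_deg,
                                      row["ra"], row["dec"]) <= radius_deg]
```

A client would fetch the resulting URL and parse the VOTable that comes back; the filter function is only a stand-in for what happens on the server side.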



1.1. Radio astronomy and the VO 

Astronomy started when the first humans raised their heads and
reckoned that the sky was very similar from night to night, with
seasonal variations that repeated from year to year. The visible
part of the spectrum has therefore always been the most used by,
and the most familiar to, all astronomers. However, the discovery
by Jansky in 1933 of an extraterrestrial and extrasolar radiation
with a wavelength of 14.6 meters opened a new spectral window that
even today is just 75 years old, a small fraction of the almost 400
years of instrumental optical astronomy.

The Virtual Observatory is also developing 
more rapidly for data from the visible parts of 
the electromagnetic spectrum, but radio astro- 
nomical data is of the utmost importance for 
understanding the most distant and obscured 
objects in the universe, and the integration of 
these data sets in the VO infrastructure has 
been a goal of our group since the beginning. 



1.2. Grid and astronomy 



The computing grid was conceived by Foster & Kesselman (1999) as a
hardware and software infrastructure that provides dependable,
consistent, pervasive and inexpensive access to high-end
computational capabilities. The analogy with the power grid arises
from the fact that, until the power grid became commonplace,
electricity-based facilities depended on their own power
generation. Providing computing on demand, as easy to access as the
power grid, is the aim of grid computing initiatives.

The next generation of astronomical instruments, which we might
call surveying instruments, such as the Large Synoptic Survey
Telescope (LSST), the Square Kilometre Array (SKA), the LOw
Frequency ARray (LOFAR), or even the Atacama Large Millimetre and
sub-millimetre Array (ALMA), will be extremely sensitive, and at
the same time their schedule will be completely automated, so that
the raw data rates will overwhelm any conventional computing
facility. For these telescopes, parallel processing of incoming
data is needed, either by extensive pre-processing at the data
generation site, or by running parallel pipelines for different
levels of data products, or by splitting the different levels of
processing between different tiers. An extreme, present-day case is
LOFAR which, as can be seen from Valentijn's proceeding, will need
a supercomputer-class data reduction engine with a tiered
architecture for data reanalysis.

At the same time, the storage needs of these instruments will be so
demanding that for some applications no access to stored data will
be possible, and only streaming access to the data will be
provided. Again, LOFAR is an example of this. In any case, data
access also has to be distributed, and different grid techniques
(GridFTP, IBM's General Parallel File System...) are used.

In the following sections, we will present the SVO (the Spanish
Virtual Observatory), grid developments in Spain, and joint grid
and Virtual Observatory efforts in our country. We will end with
our conclusions.



2. VO in Spain: the SVO 

Publicly-funded VO activities in Spain are organised around the
Spanish Virtual Observatory (SVO, Gutierrez et al. 2006). Enrique
Solano is the PI of the SVO, which joined IVOA in July 2004. With
the creation of a publicly funded thematic network on the Virtual
Observatory, the SVO has spurred collaborations between all people
with an interest in the VO: scientists who want to use VO
applications or technologies, groups wanting to publish data in the
VO, and data centres wishing to provide VO-compatible archives.

Nowadays, the SVO provides 4 Full Time Employees (FTEs) from the
LAEFF-INTA devoted to VO tasks, while the IAA-CSIC provides another
FTE. The LAEFF-INTA group constitutes what is known as the
SVO-core, and its continuity is guaranteed by recurring funding by
the Instituto Nacional de Técnica Aeroespacial (INTA).

Fig. 1. Main entry page for the VO-compliant GAUDI archive (COROT
Ground-Based Seismology Programme Archive, public access),
developed for the preparation of the CoRoT mission by the SVO-core.

The SVO-core has developed, and maintains, several VO archives,
such as INES (IUE Newly Extracted Spectra), GAUDI (CoRoT
Ground-based Asteroseismology Uniform Database Interface, see
Solano et al. 2005), and OMC (INTEGRAL mission Optical Monitoring
Camera, see Gutierrez et al. 2004), among others.

This experience will be key for the participation of the SVO-core
in the CONSOLIDER consortium for the GTC, both for the VO archive
of the telescope and for the artificial intelligence techniques for
its scientific exploitation.

The SVO maintains a presence in different VO-related international
bodies and projects: Enrique Solano is part of the IVOA Executive,
and the LAEFF-INTA is a member of the VOTech, EuroVO-DCA, and
EuroVO-AIDA EU-funded programmes.

Fostering VO-enabled science, that is, science performed with VO
tools, has always been one of the main concerns of the SVO. In that
regard, two kinds of tools have been developed by the SVO:
ready-to-use web-based tools, such as VOSED (see figure 2), and
artificial intelligence/data mining tools.

INES: http://sdc.laeff.inta.es/ines/
GAUDI: http://sdc.laeff.inta.es/gaudi/
OMC: http://sdc.laeff.inta.es/omc/

The VOSED provides an interface to Simple Spectral Access (SSA, see
Tody & Dolensky 2007), photometry, and synthetic spectra services,
and even to user-provided spectra files, so that a user can compile
a multi-wavelength Spectral Energy Distribution (SED) that can
later be fitted to simple black-body emission, or to more complex
models of stars and dust/debris disks.
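The simplest of these fits, a single black body, can be sketched in a few lines of Python. This is only an illustration of the idea and not the VOSED implementation: the temperature is found by a plain grid search, the scale factor is solved analytically for each trial temperature, and every point is assumed to carry the same fractional error. The function names and the temperature grid are our own choices.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m / s]
K_B = 1.380649e-23   # Boltzmann constant [J / K]


def planck_lambda(wavelength_m, temperature_k):
    """Black-body spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * H * C ** 2 / wavelength_m ** 5 / math.expm1(x)


def fit_blackbody(wavelengths_m, fluxes, temperatures_k):
    """Return (temperature, scale, chi2) of the best trial temperature.

    For each temperature the scale that minimises the sum of squared
    relative residuals sum((1 - scale * model / flux)^2) is computed in
    closed form. Fluxes must be strictly positive.
    """
    best = None
    for temperature in temperatures_k:
        ratios = [planck_lambda(w, temperature) / f
                  for w, f in zip(wavelengths_m, fluxes)]
        scale = sum(ratios) / sum(r * r for r in ratios)
        chi2 = sum((1.0 - scale * r) ** 2 for r in ratios)
        if best is None or chi2 < best[2]:
            best = (temperature, scale, chi2)
    return best
```

A real SED fit would weight each point by its measured uncertainty and would refine the temperature beyond the grid resolution, but the structure (one model evaluation per trial parameter set) is the same one that later makes this kind of task easy to distribute.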

In order to access theoretical spectra, the SVO also developed the
Theoretical Spectra Access Protocol (TSAP, see Rodrigo et al.
2007), an extension to the SSA with additional parameters that
allow applications to find out which models are supported, and to
have spectra synthesised on the fly from them. The VOSED uses the
TSAP to perform the fitting to theoretical spectra. Additional
theory-related efforts by SVO members include the PGos3 theoretical
model database for stellar populations, developed in coordination
with the Mexican National Institute for Astrophysics, Optics and
Electronics (INAOE).

Data mining is another constituent of the SVO. The VO allows access
to large amounts of data, but extracting meaningful properties from
huge sets of objects, such as classifying them into different
kinds, or finding completely new kinds of objects, is only possible
by means of data mining techniques, such as neural networks,
Support Vector Machines, or k-means algorithms. One example of the
use of neural networks at the SVO is the paper on automated
classification of eclipsing binaries by Sarro et al. (2006).
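As a reminder of how simple the core of one of these techniques is, the following Python sketch implements plain k-means (Lloyd's algorithm) on tuples of numeric features. It is not the method used by Sarro et al., who used neural networks; the function names are ours, and the initial centroids are passed in explicitly so that the result is reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=20):
    """Cluster points with Lloyd's algorithm, starting from given centroids.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid closest to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(x) / len(members)
                                     for x in zip(*members)))
            else:
                updated.append(centroids[k])  # empty cluster stays put
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels
```

The assignment step is independent for every object, which is why clustering large catalogues is a natural candidate for the partitioned processing discussed later in this paper.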

Several science papers have been published by members of the SVO,
or using tools developed at the SVO. Members of the SVO take part
in the VSOP project (Dall et al. 2007) for variability-type
determination through data mining techniques for not yet
characterised objects, and several serendipitous discoveries have
been made, such as that of Albus 1, a very bright white dwarf
candidate discovered by Solano & Caballero in 2007. In figure 3 we
can see that Albus 1 is clearly an outlier in the Tycho-2/2MASS
color distribution.

VOSED: http://sdc.laeff.inta.es/vosed/
PGos3: http://ov.inaoep.mx/pgos3/index.php

Fig. 2. Web page of the VOSED search form (version 0.98, March
2007). The user enters an object name, or coordinates, and selects
the different SSA services that can provide spectra for that
particular position. Photometry data can also be retrieved to
complete and create a synthetic SED, which the user can fit with
VOSED to different models.

Fig. 3. V_T versus V_T - K_s color-magnitude diagram from the data
in Caballero & Solano (2007). Tycho-2/2MASS sources with proper
motions larger and smaller than 15 mas yr^-1 are shown with crosses
and dots, respectively. Albus 1 is highlighted with a big filled
(red) star.

3. Grid in Spain 

Foster's definition of a grid can be summarised as computing power
as easy to use as electricity. That makes grid computing a subset
of networked computing, and any grid computing initiative is only
meaningful when there is a powerful enough network linking the
available computers.

In Spain, all universities and research institutes are connected
via a high-bandwidth network infrastructure that was known as
RedIRIS (see figure 4), and is now known as Red.es, a Spanish
initiative for a networked society. RedIRIS was connected to the
European research network Geant2 in March 2006, thus providing more
bandwidth for research interoperation in the EU.

From the networking research aspect of RedIRIS, research groups
started to promote the need for a grid initiative, and thus
IRISgrid was born in 2003. Several research groups had also gained
experience by participating in the Enabling Grids for E-sciencE
(EGEE), CrossGrid, EU-DataGrid, Interactive European Grid
(Int.eu.grid, also known as I2G) and other European-level grid
projects. The teams involved in IRISgrid are shown in figure 5.

Red.es: http://www.red.es/

Fig. 4. RedIRIS high-level topology. Each region has its own
networking centre that provides service to capital cities, with
county access relayed from them. RedIRIS is connected in Madrid to
the Geant2 network, to the global internet, and to Spanish internet
through the ESPANIX internet exchange point. Regional topology not
shown.

Fig. 5. Map of the different research groups/institutes initially
forming the IRISgrid initiative. Groups with an asterisk (*)
provided computing nodes for the IRISgrid computing demonstration
in December 2004.

Not surprisingly, most of the centres participating in IRISgrid
belong to the High Energy Physics/Particle Physics community.
Spanish experience also includes grid middleware development: the
Department of Computer Architecture and Automation (DACYA) at the
Universidad Complutense de Madrid has developed the GridWay
meta-scheduler, now part of EGEE's gLite middleware distribution.

However, funding for the IRISgrid initiative ended at the beginning
of 2007. Moreover, IRISgrid was never officially recognised by the
Spanish Ministry of Science and Education as the national grid
initiative. When the European Grid Initiative started to take shape
in 2007, prompting possible members to register with their own
initiatives, an officially recognised grid initiative was finally
launched: the Spanish Network for e-Science.

Still, the Spanish Network for e-Science consists mostly of the
centres that participated in the IRISgrid initiative, plus centres
that wish to learn and start using grid techniques, and no hardware
or budget (aside from continued funding for academic
inter-networking through Red.es) has been provided.

In this context, and taking into account that many of the research
centres with extensive grid experience belong to the CSIC Spanish
Research Agency (Consejo Superior de Investigaciones Científicas),
the GRID-CSIC initiative was born this year (Marco 2008) with the
aim of providing CSIC centres with high-end computing facilities
that will be networked and provided with EGEE- and I2G-compatible
middleware, so that they can be seamlessly integrated in the
Spanish NGI.

This infrastructure must be compatible with, and eventually shared
with, the joint Spain-Portugal IberGrid initiative and the French
CNRS Institut des Grilles (Grids Institute). It will be started in
2008-2009 with a test phase with an initial deployment of just
three nodes at centres with grid experience (IFCA, IFIC, IAA), with
three additional, yet to be defined, nodes in Madrid and Catalonia
for 2009-2010, and the rest of the nodes operational in 2010 (see
figure 6). Total estimated computing power at the end of the
project will be of the order of 8000 cores (either AMD Barcelona or
Intel Xeon x86_64 architecture processors), with on-line storage of
up to 1 PB (1000 TB).

DACYA: http://www.ucm.es/info/dacya/
GridWay: http://www.gridway.org/
gLite: http://glite.web.cern.ch/glite/
European Grid Initiative: http://web.eu-egi.org/
Spanish Network for e-Science: http://www.e-ciencia.es/

Fig. 6. Centres participating in the two initial phases of
GRID-CSIC. The IFCA and IFIC belong to the high-energy physics
community, and IAA is the Institute for Astrophysics of Andalusia,
which uses grid and supercomputing for planetary atmospheres,
stellar structure modelling, and N-body simulations, among others.
The second phase will include additional, yet to be defined, nodes
in Madrid and Catalonia.

Apart from project coordination and infras- 
tructure management, an additional application 
development and grid porting support area has 
been defined, so that users can make real use of 
the GRID-CSIC equipment. One of the most 
relevant evaluation criteria of the project suc- 
cess will be the percentage of users making ac- 
tive use of the grid for their research. 

The IAA's experience in IRISgrid comes from high-performance
computing users in the area of astrophysics, and the institute was
assigned the task of providing the use cases and background for
IRISgrid in the topic of astrophysics, while the IFCA and IFIC are
Tier-2 centres for the processing of LHC data. The IAA also plays a
key role in e-CA, the regional e-Science initiative for Andalusia.

e-CA: http://e-ca.iaa.es/



4. The intersection of VO and grid in 
Spain 

The VO provides an infrastructure analogous to the Data Grid
envisioned by Chervenak et al., and thus allows for parallel access
from different nodes at the same time. Grid computing within the VO
needs to exploit the parallelism in different use cases:

Parameter space exploration: several astrophysical
simulations, such as stellar structure, evolutionary
tracks, et cetera, start running with a particular set
of initial parameters, but many runs are required to
sample a particular parameter space. In this case, the
GridWay meta-scheduler can be used to easily gridify
these kinds of applications, running instances of the
same program that only differ in the input
files/parameters. Many Monte Carlo simulations also
fall in this category.

Exploration of partitioned data sets: this case is very
similar to the one above, but the parameter space is
partitioned instead of sampled, allowing instances to
run independently of each other on each partition.
Examples of this are massive searches for particular
kinds of objects, which might need querying large data
sets, even from many different archives, but where the
search can be performed independently in each region
of the sky. This partition is not necessarily spatial:
data reduction pipeline tasks partition data based on
observation blocks.

Loosely coupled simulations: the above-mentioned cases
work well because each node in the grid does not have
to communicate with others, or communication is
restricted to the beginning and the end of the process.
N-body simulations tend to be at the opposite extreme:
tracking all bodies tends to be bound by communications
rather than by processing time. However, techniques
such as Smoothed Particle Hydrodynamics and
hierarchical N-body simulations are more grid friendly,
because they restrict interaction between grid elements
to high-level interaction with the following level in a
hierarchy.
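The first of these use cases can be sketched in a few lines of Python. In the sketch below every point of a parameter grid is an independent run, so the runs can be handed to any pool of workers. A local thread pool stands in for the grid; on a real infrastructure a meta-scheduler such as GridWay would submit one job per parameter set instead. The toy model, the parameter names and all function names are hypothetical and serve only as an illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def parameter_grid(**axes):
    """Return one dict per point of the Cartesian product of the axes."""
    names = sorted(axes)
    return [dict(zip(names, values))
            for values in product(*(axes[name] for name in names))]


def toy_model(params):
    """Hypothetical stand-in for one independent simulation run."""
    return params["mass"] ** 3.5 * (1.0 + params["metallicity"])


def run_sweep(model, grid, workers=4):
    """Run the model on every grid point; results keep the grid order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model, grid))
```

Because no run depends on any other, the wall-clock time of the sweep scales down with the number of available workers until the slowest single run dominates.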
Many applications cannot be so easily partitioned. One example is
the search for related bodies (pairs or triplets of galaxies, for
instance): the partitioning technique can be used, but either the
partitions need to overlap, or special border cases have to be
considered, reducing efficiency in any case.
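The overlap requirement can be made concrete with a deliberately simplified Python sketch, in which objects are reduced to a single coordinate and a pair is any two objects closer than a maximum separation. Each strip processes its own objects plus a halo of width equal to the maximum separation beyond its upper edge, and a pair is reported only by the strip that owns its lower member, so that no pair is lost at a border or counted twice. The one-dimensional geometry and all names are assumptions made for the example; a real search would use angular separations on the sky.

```python
def brute_force_pairs(xs, max_sep):
    """Reference result: all index pairs closer than max_sep."""
    return {(i, j)
            for i in range(len(xs)) for j in range(i + 1, len(xs))
            if abs(xs[i] - xs[j]) <= max_sep}


def partitioned_pairs(xs, max_sep, edges):
    """Find the same pairs strip by strip.

    edges are ascending strip boundaries with edges[0] <= min(xs) and
    edges[-1] > max(xs). Each strip could run on a different node.
    """
    pairs = set()
    for lo, hi in zip(edges[:-1], edges[1:]):
        core = [i for i, x in enumerate(xs) if lo <= x < hi]
        halo = [i for i, x in enumerate(xs) if hi <= x <= hi + max_sep]
        for i in core:
            for j in core + halo:
                # Report the pair only from its lower member (ties by index).
                if (xs[i], i) < (xs[j], j) and xs[j] - xs[i] <= max_sep:
                    pairs.add((min(i, j), max(i, j)))
    return pairs
```

The halo objects are read by two strips, which is exactly the loss of efficiency mentioned above: the wider the search radius compared with the strip size, the larger the duplicated fraction of the data.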

In the above discussion of astrophysical ap- 
plications of the grid we have not mentioned 
the Virtual Observatory. There are different as- 
pects of the VO where grid computing might 
be beneficial: 

Visualisation: most of the VO visualisation 
data comes directly from web services. 
However, some of those images can be 
generated on-the-fly by powerful enough 
systems. If the ad-hoc generation tools are 
gridified, the system would make transpar- 
ent use of the grid infrastructure without 
passing the complexity of the system to the 
user. 

Data access: VO protocols such as VOSpace (Graham et
al. 2008) for always-available storage can be mapped on
top of the Data Grid by means of implementations that
make use of protocols such as GridFTP, IBM's General
Parallel File System, et cetera. However, access
policies still hinder the adoption of these protocols.

Data processing: the VO paradigm is to perform analysis
as near to the data as possible, in order to reduce
network bottlenecks. A pervasive grid computing
facility makes it easier for data providers to allow
sophisticated analysis tools to run against the data,
without having to worry about computing resources and
scalability. There is an IVOA Note on the Common
Execution Architecture (Harrison 2005), an API for
making VO-aware analysis tools available as web
services. These services are the facade that client
applications see, and again, the grid can be used in a
transparent way.

Data mining: given the rich metadata available for
VO-compatible data sets, data mining applications are
particularly well suited to the VO. Additionally, more
and more algorithms are being deployed in the form of
web services, which helps in the development of a
distributed infrastructure for knowledge extraction.

Most of the grid experience in Spain, even in the field of
astrophysics, has not been connected with the Virtual Observatory
in any way, but more and more users are planning to make use of it.
High-level data analysis techniques that will make use of the grid
are being prepared for 3D data sets by the AMIGA group, jointly
with the SVO-core. Groups at IAA working on stellar structure and
evolution are starting to make their code bases aware of the grid
within the e-CA project framework, and many data mining projects,
such as VSOP, will make use of the grid for their processing.
Finally, the Spanish members of the IVOA Theory Interest Group, as
part of the SVO, wish to use VO tools, and keep developing access
protocols such as TSAP, for micro simulations, i.e. not
cosmological simulations, but special-interest simulations, such as
stellar structures, initial mass function distributions, et cetera.



5. Conclusions 

We have shown that pure Virtual Observatory activities in Spain are
in very good health, and that there is ample grid experience both
inside and (especially) outside the astrophysics community, with
more and more services to be deployed on the grid as it becomes
more pervasive. Spain will have in the near future a mature enough
grid infrastructure within the European Grid Initiative, and
different research groups are already porting, or planning to port,
their code bases to the grid, taking as much as possible from
existing VO infrastructures.